# Cross-modal image understanding

Nllb Clip Large Oc

NLLB-CLIP is a multilingual vision-language model that pairs the text encoder from the NLLB model with the image encoder from CLIP and supports 201 languages.
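To make the dual-encoder idea concrete, the sketch below shows how a CLIP-style model typically scores an image against candidate captions: both encoders produce vectors in a shared space, and captions are ranked by cosine similarity followed by a softmax. This is an illustration only, not the model's actual API. The embeddings are hypothetical hand-written vectors standing in for real encoder outputs, and the function names and the `logit_scale` value of 100 are assumptions chosen for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_captions(image_embedding, caption_embeddings, logit_scale=100.0):
    """Return one probability per caption for a single image.

    logit_scale is an assumed temperature, not a value taken from the model.
    """
    sims = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    return softmax([logit_scale * s for s in sims])


# Hypothetical toy embeddings; a real model would produce these from
# its image encoder and its multilingual text encoder.
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "une photo d'un chien": [0.0, 1.0, 0.0],
    "ein Foto von einem Auto": [0.0, 0.0, 1.0],
}

probs = rank_captions(image_embedding, list(caption_embeddings.values()))
best = max(zip(caption_embeddings, probs), key=lambda pair: pair[1])[0]
print(best)
```

Because the captions are compared only through their embeddings, the same ranking step works regardless of the caption language, which is what a multilingual text encoder is meant to enable.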
